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Abstract — This tutorial paper describes the problem of image 
reconstruction from interferometric data with a particular focus 
on the specific problems encountered at optical (visible/IR) 
wavelengths. The challenging issues in image reconstruction from 
interferometric data are introduced in the general framework 
of inverse problem approach. This framework is then used 
to describe existing image reconstruction algorithms in radio 
interferometry and the new methods specifically developed for 
optical interferometry. 

Index Terms - aperture synthesis, interferometry, image re- 
, construction, inverse problems, regularization. 
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I. Introduction 

Since the first multi-telescope optical interferometer [1], 
considerable technological improvements have been achieved. 
Optical (visible/IR) interferometers are now widely open to 
the astronomical community and provide means to obtain 
unique information from observed objects at very high angular 
resolution (sub-milliarcsecond). There are numerous astro- 
physical applications: stellar surfaces, environment of pre- 
main sequence or evolved stars, central regions of active 
galaxies, etc. See [2]-[4] for comprehensive reviews about 
optical interferometry and recent astrophysical results. As 
interferometers do not directly provide images, reconstruction 
methods are needed to fully exploit these instruments. This 
paper aims at reviewing image reconstruction algorithms in 
astronomical interferometry using a general framework to 
formally describe and compare the different methods. 

Multi-telescope interferometers provide sparse measure- 
ments of the Fourier transform of the brightness distribution of 
the observed objects (cf. Section |ll|. Hence the first problem in 
image reconstruction from interferometric data is to cope with 
voids in the sampled spatial frequencies. This can be tackled 
in the framework of inverse problem approach (cf. Section 
imi) . At optical wavelengths, additional problems arise due to 
the missing of part of Fourier phase information, and to the 
non-linearity of the direct model. These issues had led to the 
development of specific algorithms which can also be formally 
described in the same general framework (cf. Section Hvl). 



II. Interferometric Data 

The instantaneous output of an optical interferometer is the 
so-called complex visibility Vj^j^(t) of the fringes given by 
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the interferences of the monochromatic light from the ji-th 
and the j2-th telescopes at instant t [3]: 
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where is the Fourier transform of I (6), the brightness 
distribution of the observed object in angular direction 6, gj (t) 
is the complex amplitude throughput for the light from the j- 
th telescope and ^'jusl^) the spatial frequency sampled by 
the pair of telescopes (ji, j^) (see Fig. [T]): 
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with A the wavelength and Vj (t) the projected position of the 
j-th telescope on a plane perpendicular to the line of sight. 
These equations assume that the diameters of the telescopes 
are much smaller than their projected separation and that the 
object is an incoherent light source. An interferometer there- 
fore provides sparse measurements of the Fourier transform of 
the brightness distribution of the observed object. The top-left 
panel of Fig. [2] shows an example of the sampling of spatial 
frequencies by an interferometer. 

In practice, the complex visibility is measured during a finite 
exposure duration: 
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where ( )m denotes averaging during the m-th exposure and 
^3\j2,rn Stands for the errors due to noise and modeling 
approximations. The exposure duration is short enough to 
consider the projected baseline rj^{t) — rj^{t) as constant, 
thus: 
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with Uj,,j2,m = {l^h,j2{t))m - 



3i,32^^'m\ tm — {t)m the 

j2,m = {gjiity gj2{t))m the 



mean exposure time and Gj^ 
effective optical transfer function (OTF). The fast variations of 
the instantaneous OTF are mainly due to the random optical 
path differences (OPD) caused by the atmospheric turbulence. 
In long baseline interferometry, two telescopes are separated 
by more than the outer scale of the turbulence, hence their 
OPDs are independent. Furthermore, the exposure duration is 
much longer than the evolution time of the turbulence (a few 
10 ms) and averaging can be approximated by expectation: 

with E{}^ the expectation during the m-th exposure. Dur- 
ing this exposure, the phase of gj{t) is ^j(t) = 0j,m + 
(2 tt/A) dj(t) with (j)j^rri = {<l^j{t))m the static phase aberration 
and Sj (t) ~ A/'(0, cr|) the OPD which is a zero-mean Gaussian 
variable with the same standard deviation for all telescopes [5]. 
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Fig. 1. Geometrical layout of an interferometer. B is the projected baseline, 
6 is the view angle and 6 is the geometrical optical path difference which is 
compensated by the delay lines. 



For a given telescope, the amplitude and phase of the complex 
throughput can be assumed independent, hence: 

with gj^rn = \9j{tm)\ exp(-i (/)^-,^) and = (2 7r/A)2cr| the 
variance of the phase during an exposure. The OTF is finally: 
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At long wavelengths (radio), the phase variation during each 
exposure is small, hence Gj^j^^m — m9j2,'m 

^ 0. If 

some means to calibrate the gj^m'^ are available, then image 
reconstruction amounts to deconvolution (cf. Section [IIl|); oth- 
erwise, self-calibration (cf. Section Ull-Hb has been developed 
to jointly estimate the OTF and the brightness distribution of 
the object given the measured complex visibilities. 

At short wavelengths (optical), the phase variance exceeds 
a few squared radians and Gj^j^^rn — 0, hence the object's 
complex visibility cannot be directly measured. A first solution 
would be to compensate for the OPD errors in real time using 
fast delay lines. This solution however requires a bright refer- 
ence source in the vicinity of the observed object and dedicated 
instrumentation [6] that is currently in development and not yet 
available. An alternative solution consists in integrating non- 
linear estimators that are insensitive to telescope-wise phase 
errors. This requires high acquisition rates (about 1000 Hz in 
the near infrared) and involves special data processing but 
otherwise no special instrumentation. 

To overcome loss in visibility transmission due to fast 
varying OPD errors, current optical interferometers integrate 
the power spectrum (for ji 22)'- 

^jid2,rn = (I ^■1,^2(^)1 )m — Pji,m Pj2,m \^{'^ji,j2,rn)\ 1 (6) 

with pj^rn = {\9jit)\^)m the mean squared modulus of the 
complex throughput of the j-th telescope during the m-th 
exposure. By construction, the Pj,m's are insensitive to the 
phase errors and so is the power spectrum. Unlike that of 
the complex visibility, the transfer function pj-^^m Pj^ ,m of 
the power spectrum is not negligible. This transfer function 
can be estimated by simultaneous photometric calibration and, 
to compensate for remaining static effects, from the power 



spectrum of a reference source (a so-called calibrator). Hence 
the object power spectrum |i^(^'ji,j2,m)P can be measured by 



in spite of phase errors due to the turbulence. 



To obtain Fourier phase information (which is not provided 
by the power spectrum), the bispectrum of the complex 
visibilities is measured: 
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where ji, and js denote three different telescopes. As for 
the power spectrum, the transfer function Pj^^m Pj2,m Pj3,m of 
the bispectrum can be calibrated. Since this transfer function 
is real, it has no effect on the phase of the bispectrum (the 
so-called phase closure) which is equal to that of the object: 

P 31, 32, 33, m = ^^S(^ji,j2,j3,m) 

= arg(/(l/^-,,^2,m) H^32j3,m) i{iyj:,J,,m)) • (8) 

Some phase information is however missing. Indeed, from all 
the interferences between T telescopes (in a non-redundant 
configuration), T (T — l)/2 different spatial frequencies are 
sampled but the phase closure only yields (T — 1) (T — 2)/2 
linearly independent phase estimates [3]. The deficiency of 
phase information is most critical for a small number of 
telescopes. Whatever the number of telescopes is, at least the 
information of absolute position of the observed object is lost. 

In practice, obtaining the power spectrum and the bispec- 
trum involves measuring the instantaneous complex visibilities 
(that is, for a very short integration time compared to the 
evolution of the turbulence) and averaging their power spec- 
trum and bispectrum over the effective exposure time. Being 
non-linear functions of noisy variables, these quantities are 
biased but the biases are easy to remove [7], [8]. To simplify 
the description of the algorithms, we will consider that the 
de-biased and calibrated power spectrum and bispectrum are 
available as input data for image reconstruction, thus: 
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where Sf, ^ and Bf, . „ are zero-mean terms that 

31i32i"i' Jlij2ij3i"'' 

account for noise and model errors. Instead of the complex 
bispectrum data, we may consider the phase closure data: 
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where (p{iy) = arg(/(L/)) is the Fourier phase of the object 
brightness distribution, arc( ) wraps its argument in the range 
(-TT, +7r] and Pj[\j^j^^rn denotes the errors. 



HI. Imaging from Sparse Fourier Data 

We consider here the simplest problem of image reconstruc- 
tion given sparse Fourier coefficients (the complex visibilities) 
and first assuming that the OTF has been calibrated. 



A. Data and Image Models 

To simplify the notation, we introduce the data vector 
2/ G which collates all the measurements: y^^ = V^^^^^^^ 
with I (j 1 , j2 , m) to denote a one-to-one mapping between 
index I and triplet ( ji , j2 , m) . Long baseline interferometers 
provide data for a limited set C = {i^k}k=i,...,K of observed 
spatial frequencies. For each i/^, there is a non-empty set Bk 
of telescope pairs and exposures such that: 



(ji, j2,m) e Bk 



J2,m 



or equivalently: 



>S/c =^ {(ji, j2,m) G X r^2,m - r'ji,m = Az/fc} (12) 

with A and f the sets of apertures (telescopes or antennae) 
and exposure indexes, and Vj^rn = (^j(^))m the mean position 
of the j-th telescope during the m-th exposure. Introducing 
Bk and the set C of observed frequencies is a simple way to 
account for all possible cases (with or without redundancies, 
multiple data sets, observations from different interferometers, 
etc.). Note that, if every spatial frequency is only observed 
once, then L = K and we can use i = k. 

The image is a parametrized representation of the object 
brightness distribution. A very general description is given by 
a linear expansion: 
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where {bn{0)}n^i,...,N are basis functions and x G 
are the image parameters, for instance, the values of the 
image pixels, or wavelet coefficients. Given a grid of angular 
directions Q = {On}n=i,...,N and taking bn{0) = b{6 — On), 
a grid model is obtained: 
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Using an equispaced grid, the usual pixelized image represen- 
tation is obtained with pixel shape b{0). The function b{0) 
can also be used as a building-block for image reconstruction 
[9]. Alternatively, b{0) may be seen as the neat beam that sets 
the effective resolution of the image [10]. 

The size of the synthesized field of view and the image 
resolution must be chosen according to the extension of the 
observed object and to the resolution of the interferometer, see 
e.g. [10]. To avoid biases and rough approximations caused by 
the particular image model, the grid spacing A6> should be well 
beyond the limit imposed by the longest baseline: 

« (15) 

where 5max = '^^'^ji,j2,t kji(^) — ''"^2(^)1 1^ maximum 
projected separation between interfering telescopes. Oversam- 
pling by a factor of at least 2 is usually used and the pixel size 
is given by: < A/(4 5rnax)- To avoid aliasing and image 
truncation, the field of view must be chosen large enough and 



without forgetting that the reciprocal of the width of the field 
of view also sets the sampling step of the spatial frequencies. 

The model of the complex visibility at the observed spatial 
frequencies is: 
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where the coefficients of the matrix T G C^^^ are Tk^n = 
bn{^k) or Tk^n — b(yk)^~^'^^^'''^^ depending which model 
of Eq. ([T3l) or Eq. ([T4l) is used. The matrix T performs the 
Fourier transform of non-equispaced data, which is a very 
costly operation. This problem is not specific to interferometry, 
similar needs in crystallography, tomography and bio-medical 
imaging have led to the development of fast algorithms to 
approximate this operation [11]. For instance: 



T:^R F S, 



(17) 



where F G ^ ^ is the fast Fourier transform (EFT) operator, 
R G C^^^ is a linear operator to interpolate the discrete 
Fourier transform of the image = F • at the observed 
spatial frequencies and S is diagonal and compensates the field 
of view apodization (or spectral smoothing) caused by R. 

In radio astronomy a different technique called regridding 
[12], [13] is generally used, which consists in interpolating 
the data (not the model) onto the grid of discrete frequencies. 
The advantage is that, when there is a large number of 
measurements, the number of data points is reduced, which 
speeds up further computations. There are however a number 
of drawbacks to the regridding technique. First it is not 
possible to apply the technique to non-linear estimators such 
as the power spectrum and the bispectrum. Second, owing to 
the structure of the regridding operator, the regridded data are 
correlated even if the original data are not. These correlations 
are usually ignored in further processing and the pseudo- 
data are assumed to be independent, which results in a poor 
approximation of the real noise statistics. This can be a critical 
issue with low signal to noise data [14]. 

Putting all together, the direct model of the data is affine: 



y = A • 



(18) 



with e the error vector (e^ = K^/j2,m)' A = G • T the linear 
model operator and G G C^^ the OTF operator given by: 
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if ^ ~ (ji,j2,m) G Bk 
else. 



(19) 



Applying the pseudo-inverse T+ = S ^ • F ^ • R+ of T 
to the data yields the so-called dirty image (see Fig. [2]): 



(20) 



where e = T+ • e and H = T+ • G • T. Apart from the 
apodization, H essentially performs the convolution of the 
image by the dirty beam (see Fig. O. From Eq. (fTSl) and 
Eq. (|2Q1) , image reconstruction from interferometric data can 
be equivalently seen as a problem of interpolating missing 
Fourier coefficients or as a problem of deconvolution of the 
dirty map by the dirty beam [15]. 
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Fig. 2. Top left: (u, v) coverage. Top right: observed object. Bottom left: 
dirty beam. Bottom right: dirty image. Object model and {u, v) coverage are 
from the 2004' Beauty Contest [16]. 



B. Inverse Problem Approach 

Since many Fourier frequencies are not measured, fitting the 
data alone does not uniquely define the sought image. Such 
an ill-posed problem can be solved by an inverse problem 
approach [17] by imposing a priori constraints to select a 
unique image among all those which are consistent with 
the data. The requirements for the priors are that they must 
help to smoothly interpolate voids in the (u^v) coverage 
while avoiding high frequencies beyond the diffraction limit. 
Without loss of generality, we assume that these constraints are 
monitored by a penalty function /prior (^c) which measures the 
agreement of the image with the priors: the lower /prior (^c), the 
better the agreement. In inverse problem framework, /prior (a^) 
is termed as the regularization. Then the parameters ic+ of the 
image which best matches the priors while fitting the data are 
obtained by solving a constrained optimization problem: 

x+ = argmin /prior (ic) , subject to: A ■ x = y . (21) 

Other strict constraints may apply. For instance, assuming 
the image brightness distribution must be positive and normal- 
ized, the feasible set is: 



a: = {x eR^;x>0,^Xn = l} 



(22) 



where x > means: Vn, Xn > 0. Besides, due to noise and 
model approximations, there is some expected discrepancy 
between the model and actual data. As for the priors, the 
distance of the model to the data can be measured by a penalty 
function /data(^c)- We then require that, to be consistent with 
the data, an image must be such that /data(3^) < ^data where 
7/data is set according to the level of errors: 

x+ = argmin /prior (ic) , subject to: /data(ic) < ^data . (23) 

The Lagrangian of this constrained optimization problem can 
be written as: 

C{x; £) = /prior(a^) + £ (/data(a^) - ^data) (24) 



where £ is the Lagrange multiplier associated to the inequality 
constraint /data(^c) ^ ^data- If the constraint is active^ then 
^ > and /data(^c) = ^data [18]. Dropping the constant 
^data which docs not depend on x, the solution is obtained 
by solving either of the following problems: 

x+ = argmin{/prior(a^) + /data(a^)} 

= argmin/(ic; ji) , 

where 

/(a^; = /data(a^) + M fpv\ov{x) (25) 

is the penalty function and /i = 1/^ > has to be tuned 
to match the constraint /data(^c) = ^data- Hence we can 
equivalently consider that we are solving the problem of 
maximizing the agreement of the model with the data subject 
to the constraint that the priors be below a preset level: 

= argmin/data(ic) , subject to: fpr\or{x) < r]pr\or ' (26) 

For convex penalties and providing that the Lagrange mul- 
tipliers (/i and £) and the thresholds (r^data and r^prior) are set 
consistently, the image restoration is achieved by solving either 
of the problems in Eq. (1231) , Eq. (l26l) or by minimizing the 
penalty function in Eq. (1251) . However, choosing which of these 
particular problems to solve can be a deciding issue for the 
efficiency of the method. For instance, if /data(^c) and /prior (^^^) 
are both smooth functions, direct minimization of f{x; fi) in 
Eq. (l25l) can be done by using general purpose optimization 
algorithms [18] but requires to know the value of the Lagrange 
multiplier. If the penalty functions are not smooth or if one 
wants to have the Lagrange multiplier automatically tuned 
given T^data or ^prior, spccific algorithms must be devised. As 
we will see in the following, specifying the image recon- 
struction as a constrained optimization problem provides a 
very general framework suitable to describe most existing 
methods; it however hides important algorithmic details about 
the strategy to search the solution. In the remaining of this 
section, we first derive expressions of the data penalty terms 
and, then, the various regularizations that have been considered 
for image reconstruction in interferometry. 

C. Distance to the Data 

The £2 norm is a simple means to measure the consistency 
of the model image with the data: 



fdataix) = \\y- A-X\ 



2 • 



(27) 



However, to account for correlations and for the inhomoge- 
neous quality of the measurements, the distance to the data 
has to be defined according to the statistics of the errors 
e = y — A ■ X given the image model. Assuming Gaussian 
statistics, this leads to: 



/data(^c) = {y-A-x)^- Wen ' (?/ - A • x) , 



(28) 



where the weighting matrix Wgrr = C^^r^ is the inverse of 
the CO variance matrix of the errors. There is a slight issue 

^Conversely, the constraint being inactive would imply that i = 0, which 
would mean that the data are useless, which is hopefully not the case... 



because we are dealing with complex values. Since complex 
numbers are just pairs of reals, complex valued vectors (such 
as e and A - x) can be flattened into ordinary real vectors 
(with doubled size) to use standard linear algebra notation and 
define the covariance matrix as Cgrr = (e • e^). This is what 
is assumed in Eq. (1281) . 

There are some possible simplifications. For instance, the 
complex visibilities are measured independently, hence the 
weighting matrix Wgrr is block diagonal with 2x2 blocks. 
Furthermore, if the real and imaginary parts of a given 
measured complex visibility are uncorrected and have the 
same variance, then /data takes a simple form: 

/data(a^) = ^^i IVi - (A • x)if , (29) 

i 

where the weights are given by: 

= Var (Re {y^))-^ = Var (Im {y^))-^ . (30) 

This expression of /data(^^^), popularized by Goodman [19], is 
very commonly used in radio-interferometry. 

Real data may however have different statistics. For in- 
stance, the OI-FITS file exchange format for optical inter- 
ferometric data assumes that the amplitude and the phase of 
complex data (complex visibility or triple product) are inde- 
pendent [20]. The thick lines in Fig. [3] display the isocontours 
of the corresponding log-likelihood which forms a non-convex 
valley in the complex plane. Assuming Goodman statistics 
would yield circular isocontours in this figure and is obviously 
a bad approximation of the true criterion in that case. To 
improve on the Goodman model while avoiding non-convex 
criteria, Meimon et al [14] have proposed quadratic convex 
approximations of the true log-likelihood (see Fig.O and have 
shown that their so-called local approximation yields the best 
results, notably when dealing with low signal to noise data. 
For a complex datum yi = pi exp(i(/9^), their local quadratic 
approximation writes: 

UM = Y.\^^2 -^^^2 -\ (31) 

I I ^//.^ ^^^^ J 

where e = y — A • x denotes the complex residuals and the 
variances along and perpendicular to the complex datum vector 
are: 

= Var(p,) (32) 
= pI Var((/:)£) . (33) 

The Goodman model is retrieved when Var((/9^) = 
Var(p£). 

D. Maximum Entropy Methods 

Maximum entropy methods (MEM) are based on the 1950s 
work of Jaynes on information theory; the underlying idea is to 
obtain the least informative image which is consistent with the 
data [21]. This amounts to minimizing a criterion like the one 
in Eq. (1251) with /prior (^c) = —S{x) where the entropy S{x) 
measures the informational contents of the image x. In this 
framework, /prior (^c) is sometimes called negentropy. Among 
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Fig. 3. Convex quadratic approximations of complex data. Thick lines: 
isocontours of the log-likelihood /data (at 1,2 and 3 rms levels) for a complex 
datum with independent amplitude and phase. Dashed lines: isocontours for 
the global quadratic approximation. Thin lines: isocontours for the local 
quadratic approximation. 

all the expressions considered for the negentropy of an image, 
one of the most popular is [22] : 

fpnorix) = [Xj \og{Xj/Xj) - Xj + Xj] (34) 

3 

with X the default image; that is, the one which would be 
recovered in the absence of any data. Back to information 
theory, this expression is similar to the Kullback-Leibler 
divergence between x and x (with additional terms that cancel 
for normalized distributions). The default image x can be 
taken as being a flat image, an image previously restored, an 
image of the same object at a lower resolution, etc. Narayan 
& Nityananda [23] reviewed maximum entropy methods for 
radio-interferometry imaging and compared the other forms 
of the negentropy that have been proposed. They argued that 
only non-quadratic priors can interpolate missing Fourier data 
and noted that such penalties also forbid negative pixel values. 
The fact that there is no need to explicitly impose positivity is 
sometimes put forward by the proponents of these methods. 

MEM penalties are usually separable, which means that 
they do not depend on the ordering of the pixels. To explicitly 
enforce some correlation between close pixels in the sought 
image x (hence, some smoothness), the prior can be chosen 
to depend on x. For instance: x = F x where P is some 
averaging or smoothing linear operator. This type of floating 
prior has been used to loosely enforce constraints such as 
radial symmetry [24]. Alternatively, an intrinsic correlation 
function (ICE) can be explicitly introduced by a convolution 
kernel to impose the correlation structure of the image [25]. 

Minimizing the joint criterion in Eq. (1251) with entropy 
regularization has a number of issues as the problem is highly 
non-linear and as the number of unknowns is very large 
(as many as there are pixels). Various methods have been 
proposed, but the most effective algorithm [26] seeks for the 
solution by a non-linear optimization in a local sub- space of 
search directions with the Lagrange multiplier p tuned on the 
fly to match the constraint that /data(^^^) = ^data- 

E. Other Prior Penalties 

Bayesian arguments can be invoked to define other types of 
regularization. Eor instance, assuming that the pixels have a 



Gaussian distribution leads to quadratic penalties such as: 

Uno.{x) = {x- xf ■ C;l^ -{x-x) (35) 

with C prior the prior covariance and x the prior solution. 
Tikhonov's regularization [27], /prior (^c) = II ^11 2' the sim- 
plest of these penalties. By Parseval's theorem, this regulariza- 
tion favors zeroes for unmeasured frequencies, it is therefore 
not recommended for image reconstruction in interferometry. 
Yet, this does not rule out all quadratic priors. For instance, 
compactness is achieved by a very simple quadratic penalty: 

Unor{x)=J2wr'xl, (36) 

n 

where the weights are increasing with the distance to the center 
of the image thus favoring structures concentrated within this 
part of the image. Under strict normalization constraint and in 
the absence of any data, the default image given by this prior is 
oc 1 / w^^^"^ where the factor comes from the normalization 
requirement [28]. Although simple, this regularizer, coupled 
with the positivity constraint, can be very effective as shown by 
Fig. |4l3. Indeed, smooth Fourier interpolation follows from the 
compactness of the brightness distribution which is imposed 
by /prior (^^^) in Eq. (l36l) and by the positivity as it plays the 
role of a floating support. 

Other prior penalties commonly used in image restoration 
methods can be useful for interferometry. For instance, edge- 
preserving smoothness is achieved by: 

/prior(a;) = Yl + |Va^|2^,„, (37) 

ni,n2 

where e > is a chosen threshold and |Virp is the squared 
magnitude of the spatial gradient of the image: 

I^*^lni,n2 (•^ni + l,n2 ^721,722) ~^ (^ni,7T,2 + l ^711,712) • 

The penalization in Eq. (l37l) behaves as a quadratic (resp. 
linear) function where the magnitude of the spatial gradient is 
small (resp. large) compared to e. Thus reduction of small local 
variations without penalizing too much strong sharp features 
is achieved by this regularization. In the limit e ^ 0, edge- 
preserving smoothness behaves like total variation [29] which 
has proved successful in imposing sparsity. 

E Clean method 

Favoring images with a limited number of significant pixels 
is a way to avoid the degeneracies of image reconstruction 
from sparse Fourier coefficients. This could be formally done 
by searching for the least ^o-norm image consistent with 
the data; hence using /prior (^c) = ll^cllo- However, due to 
the number of parameters, minimizing the resulting mixed 
criterion is a combinatorial problem which is too difficult 
to solve directly. Clean algorithm [30], [31] implements 
a matching pursuit strategy to attempt to find this kind of 
solution. The method proceeds iteratively as follows. Given 
the data in the form of a dirty image, the location of the 
brightest point source that best explains the data is searched. 
The model image is then updated by a fraction of the intensity 
of this component, and this fraction times the dirty beam is 



subtracted from the dirty image. The procedure is repeated for 
the new residual dirty image which is searched for evidence 
of another point-like source. When the level of the residuals 
becomes smaller than a given threshold set from the noise 
level, the image is convolved with the clean beam (usually a 
Gaussian shaped PSF) to set the resolution according to the 
extension of the {u^ v) coverage. Once most point sources have 
been removed, the residual dirty image is essentially due to 
the remaining extended sources which may be smooth enough 
to be insensitive to the convolution by the dirty beam. Hence 
adding the residual dirty image to the clean image produces 
a final image consisting in compact sources (convolved by 
the clean beam) plus smooth extended components. Although 
designed for point sources. Clean works rather well for 
extended sources and remains one of the preferred methods 
in radio-interferometry. 

It has been demonstrated that the matching pursuit part 
of Clean is equivalent to an iterative deconvolution with 
early stopping [32] and that it is an approximate algorithm 
for obtaining the image of minimum total flux consistent 
with the observations [33]. Hence, under the non-negativity 
constraint, this would, at best, yield the least ^i-norm image 
consistent with the data. This objective is supported by recent 
results in Compressive Sensing [34] showing that, in most 
practical cases, regularization by the ^i-norm of x enforces the 
sparsity of the solution. However, the matching pursuit strategy 
implemented by Clean is slow, it has some instabilities and 
it is known to be sub-optimal [33]. 

G. Other methods 

This section briefly reviews other image reconstruction 
methods applied in astronomical interferometry. 

1) Multi-resolution: These methods aim at reconstructing 
images with different scales. They basically rely on recursive 
decomposition of the image in low and high frequencies. 
Multi-resolution Clean [35] first reconstructs an image of the 
broad emission and then iteratively updates this map at full 
resolution as in the original Clean algorithm. This approach 
has been generalized by using a wavelet expansion to describe 
the image — which could be formally expressed in terms of 
Eq. ([T3]) — and achieved multi-resolution deconvolution by a 
matching pursuit algorithm applied to the wavelet coefficients 
and such that the solution satisfies positivity and support con- 
straints [36]. The multi-scale Clean algorithm [37] explicitly 
describes the image as a sum of components with different 
scales and makes use of a weighted matching pursuit algorithm 
to search for the scale and position of each image update. 
The main advantages of multi-scale Clean are its ability to 
leave very few structures in the final residuals and to correctly 
estimate the total flux of the observed object. This method is 
widely used in radio-astronomy and is part of standard data 
processing packages [38]. In the context of MEM, the multi- 
channel maximum entropy image reconstruction method [39] 
introduces a multi- scale structure in the image by means of 
different intrinsic correlation functions [25]. The reconstructed 
image is then the sum of several extended sources with 
different levels of correlation. This approach was extended 



by using a pyramidal image decomposition [40] or wavelet 
expansions [41], [42]. 

2) Wipe method: The Wipe method [10] is a regularized 
fit of the interferometric data under positivity and support 
constraints. The model image is given by Eq. ([T3]) using an 
equally- spaced grid and the effective resolution is explicitly 
set by the basis function h{0), the so-called neat beam, with 
an additional penalty to avoid super resolution. The image 
parameters are the ones that minimize: 

/wipe(ic) = ^wi\hyi-{A-x)i\^ ^ \{¥-x)k\^ 

i k,\Uk\>l^eff 

with y the calibrated complex visibility data, bi the Fourier 
transform of the neat beam at the spatial frequency of datum 
Vi, ^eff ^ sup^^^l^^l an effective cutoff frequency, F the 
Fourier transform operator, A the model matrix given by 
Eq. (fTSl) accounting for the sub-sampled Fourier transform 
and the neat beam, and wi = l/(cr| crj,'^) where cr| = 
\hi\^ Yd.r{yi) assuming the Goodman approximation. In the 
criterion minimized by Wipe, one can identify the distance 
of the model to the data and a regularization term. There is 
no hyper-parameter to tune the level of this latter term. The 
optimization is done by a conjugate gradient search with a 
stopping criterion derived from the analysis of the conditioning 
of the problem. This analysis is built up during the iterations. 

3) Bi-model method: The case of an image model explicitly 
mixing extended source and point sources has also been 
addressed [43], [44] and more recently [15]. The latter have 
considered an image x = Xq ^ x^ made of two maps: Xq 
for extended structures and x^ for point-like components. The 
maps Xq and x^ are respectively regularized by imposing 
smoothness and sparsity. With additional positivity and, op- 
tionally, support constraints, it turns out that the two kinds 
of regularization can be implemented by quadratic penalties. 
Their method amounts to minimize: 

/mix(a^e,a?p) = \\y - A - {Xe + Xp)f + Ag • Xp 

+ es||xp|p + Ac||xe||^^^,^^ + e^ (c^-a^e)' 

with ||iCe||c ^ l^^^l finite difference norm similar to 
Eq. (1351) . and c a vector with all components set to one, 
hence: - x = Xn- There are 4 tuning parameters for the 
regularization terms: As > and eg > control the sparsity of 
Xp, Ac > controls the level of smoothness in the extended 
map Xe, and em > (or em > if there is a support constraint) 
insures strict convexity of the regularization with respect to 
Xq. Circulant approximations are used to implement a very 
fast minimization of fm\x{xe^Xp) under the constraints that 
Xe>0 and Xp > [15]. 

H. Self-Calibration 

When the OTF cannot be calibrated (e.g. there is no refer- 
ence or the OTF is significantly varying due to the turbulence), 
the problem is not only to derive the image parameters x, 
but also the unknown complex throughputs g. As there is no 
correlation between the throughputs and the observed object. 



the inverse approach leads to solve: 

{x,g)^ = argmin{/data(a?,c/) +Mimg/pHor(^) 

with /iimg/pHor(^) /^gain /prioK^') rcgularization terms 
for the image parameters and for the complex throughputs. The 
latter can be derived from prior statistics about the turbulence 
[5]. In principle, global optimization should be required to 
minimize the non-convex criterion in Eq. (l38l) . Fortunately, a 
simpler strategy based on alternate minimization with respect 
to X only and then with respect to g only has proved effective 
to solve this problem. This method has been called self- 
calibration because it uses the current estimate of the sought 
image as a reference source to calibrate the throughputs. The 
algorithm begins with an initial image x^^^ and repeats until 
convergence the following steps (starting with n = 1 and 
incrementing n after each iteration): 

1) Self-calibration step. Given the image x^^~'^\ find the 
best complex throughputs g^^^ by solving: 

=argmin{/data(a^["-^l,g) +Mgain/pt:(9)| • 

9 ^ ^ 

2) Image reconstruction step. Apply image reconstruction 
algorithm to recover a new image estimate given the data 
and the complex throughputs: 

= argmin {/d3ta(a^,9["l) + A^img Cli^)} ■ 

Note that any image reconstruction algorithm described pre- 
viously can be used in the second step of the method. The 
criterion in Eq. (1381) being non-convex, the solution should 
depend on the initialization. Yet, this does not appear to be an 
issue in practice even if simple local optimization methods are 
used to solve the self-calibration step (such as the one recently 
proposed by Lacour et al. [45]). 

Self-calibration was initially proposed by Readhead & 
Wilkinson [46] to derive missing Fourier phase information 
from phase closure data, and the technique was later improved 
by Cotton [47]. Schwab [48] was the first to solve the 
problem by explicitly minimizing a non-linear criterion similar 
to fdata{x^g) in Eq. (|29l) . Schwab's approach was further 
improved by Cornwell & Wilkinson [49] who introduced 
priors for the complex gains, that is, the term /igain /pHoK^) 
the global penalty. However, for most authors, no priors about 
the throughputs are assumed, hence /igain = 0. 

Self-calibration is a particular case of the blind, or myopic, 
deconvolution methods [50] that have been developed to 
improve the quality of blurred images when the point spread 
function (PSF) is unknown. Indeed, when the PSF can be 
completely described by phase aberrations in the pupil plane, 
blind deconvolution amounts to solving the same problem as 
self-calibration [51]. 

IV. Image Reconstruction from Non-Linear Data 

At optical wavelengths, the complex visibilities (whether 
they are calibrated or not) are not directly measurable, the 
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Fig. 4. Image reconstruction with various types of regularization. From 
top-left to bottom-right: (a) original object smoothed to the resolution of 
the interferometer (FWHM ~ 15 mas); (b) reconstruction with a quadratic 
regularization given by Eq. (|36) and which imposes a compact field of 
view; (c) reconstruction with edge-preserving regularization as in Eq. JT/l ; 
(d) reconstruction with maximum entropy regularization as in Eq. ( 134V All 
reconstructions by algorithm MiRA and from the power spectrum and the 
phase closures. 



available data (c/ Section [III) are the power spectrum, the 
bispectrum and/or the phase closure. Image reconstruction al- 
gorithms can be designed following the same inverse problem 
approach as previously. In particular, the regularization can 
be implemented by the same /prior penalties as in Section [nil 
However, the direct model of the data is now non-linear and 
specific expressions to implement /data have to be derived. 
The non-linearity has also some incidence on the optimization 
strategy. 

A. Data Penalty 

The power spectrum, the bispectrum, and the phase closure 
data have non-Gaussian statistics: the power spectrum is a 
positive quantity, the phase closure is wrapped in (— tt, +7r], 
etc. Most algorithms however make use of quadratic penalties 
with respect to the measurements which implies Gaussian 
statistics in a Bayesian framework. Another assumption gen- 
erally made is the independence of the measurements, which 
leads to separable penalties. Under such approximations, the 
penalty with respect to the power spectrum data writes: 
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with S^^'^jf^^ix) = \i{iyj,j,^m)\^ the model of the power 
spectrum. For the penalty with respect to the bispectrum data, 
there is the additional difficulty to deal with complex data. 
The Goodman approximation [19] yields: 
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with B^^jf^^^^^ix) = i{iyj,^j,^m)i{iyj2,33,m)i{l^js,3um) the 

model of the bispectrum and weights derived from the variance 



of the bispectrum data. An expression similar to that in 
Eq. (|3T1) can be derived for bispectrum data with independent 
modulus and phase errors. To account for phase wrapping, 
Haniff [52] proposed to define the penalty with respect to the 
phase closure data as: 
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the model of the phase closure. This penalty is however not 
continuously differentiable with respect to x, which can pre- 
vent the convergence of optimization algorithms. This problem 
can be avoided by using the complex phasors [53]: 



/dita(^) — ^ 



• /omodel 



idata 

ji,j2j3,rnj 



^ , (42) 

var ( n^"^^ . i 

mji<j2<j3 

which is approximately equal to the penalty in Eq. (|4T1) in the 
limit of small phase closure errors. 

Depending on which set of data is available, and assuming 
that the different types of data have statistically independent 
errors, the total penalty with respect to the data is simply a 
sum of some of the penalties given by equations ([39l)-(l42l). 
For instance, to fit the power spectrum and the phase closure 
data: 

f6M=r,Ux)+fg,,{x). (43) 

B. Image Reconstruction Algorithms 

We describe here the image reconstruction methods used 
with some success on realistic optical interferometric data in 
astronomy and which can be considered as ready to process 
real data. In addition to cope with sparse Fourier data, these 
methods were specifically designed to tackle the non-linear 
direct model, to account for the particular statistics of the data 
[14] and to handle the new data format [20]. These image re- 
construction methods can all be formally described in terms of 
a criterion to optimize, perhaps under some strict constraints, 
and an optimization strategy. Some of these algorithms have 
clearly inherited from methods previously developed: BSMEM 
[54], the Building-Block Method [9] and WiSARD [55] are 
respectively related to MEM (c/ Section IIII-DI) . Clean (cf. 
Section UlI-FI) and self-calibration (cf. Section Ull-Hb . 

1) BSMEM algorithm [54], [56] makes use of a Maximum 
Entropy Method (cf. Section IIII-DI) to regularize the problem 
of image restoration from the measured bispectrum (hence 
its name). The improved BSMEM version [56] uses the Gull 
and Skilling entropy, see Eq. (l34l) . and a likelihood term with 
respect to the complex bispectrum which assumes independent 
Gaussian noise statistics for the amplitude and phase of the 
measured bispectrum. The optimization engine is MemSys 
which implements the strategy proposed by Skilling & Bryan 
[26] and automatically finds the most likely value for the 
hyper-parameter /i. The default image is either a Gaussian, a 
uniform disk, or a Dirac centered in the field of view. Because 
it makes no attempt to directly convert the data into complex 
visibilities, a strength of BSMEM is that it can handle any type 
of data sparsity (such as missing closures). Thus, in principle. 



BSMEM could be used to restore images when Fourier phase 
data are completely missing (see Fig. [5]). 

2) The Building Block Method [9] is similar to the Clean 
method but designed for reconstructing images from bispec- 
trum data obtained by means of speckle or long baseline 
interferometry. The method proceeds iteratively to reduce a 
cost function f^^^^^ equal to that in Eq. (l4Ql) with weights set to 
a constant or to an expression motivated by Wiener filtering. 
The minimization of the penalty is achieved by a matching 
pursuit algorithm which imposes sparsity of the solution. The 
image is given by the building block model in Eq.'s ([T3l) - ([T4b 
and, at the n-th iteration, the new image /[^] {6) is obtained by 
adding a new building block at location O^^'^ with a weight a^'^^ 
to the previous image, so as to maintain the normalization: 

/[^](6/) = (1 - Qft^]) /[^-^](6/) + b{0 - 6/[^]) . 

The weight and location of the new building block is de- 
rived by minimizing the criterion f^^^^^ with respect to these 
parameters. Strict positivity and support constraint can be 
trivially enforced by limiting the possible values for a^^'^ 
and 6/[^l To improve the convergence, the method allows 
to add/remove more than one block at a time. To avoid 
super resolution artifacts, the final image is convolved with 
a smoothing function with size set according to the spatial 
resolution of the instrument. 

3) MACIM algorithm [57], for MArkov Chain IMager, aims 
at maximizing the posterior probability: 

Vl{x\y) (X exp(^-i /data(a^) - | /prior(a^)) . 

MACIM implements MEM regularization and a specific reg- 
ularizer which favors large regions of dark space in-between 
bright regions. For this latter regularization, /prior (a^) is the 
sum of all pixels with zero flux on either side of their 
boundaries. MACIM attempts to maximize Vi:{x\y) by a 
simulated annealing algorithm with the Metropolis sampler. 
Although maximizing Vr(x\y) is the same as minimizing 
/data(3^) + M /prior (a^), the usc of normalized probabilities is 
required by the Metropolis sampler to accept or reject the 
image samples. In principle, simulated annealing is able to 
solve the global optimization problem of maximizing Vr(x\y) 
but the convergence of this kind of Monte-Carlo method for 
such a large problem is very slow and critically depends on 
the parameters which define the temperature reduction law. A 
strict Bayesian approach can also be exploited to derive, in a 
statistical sense, the values of the hyper-parameters (such as 
li) and some a posteriori information such as the significance 
level of the image. 

4) MiRA algorithm [53] defines the sought image as the 
minimum of the penalty function in Eq. (1251) . Minimization is 
done by a limited variable memory method (based on BEGS 
updates) with bound constraints for the positivity [58]. Since 
this method does not implement global optimization, the image 
restored by MiRA depends on the initial image. MiRA is 
written in a modular way: any type of data can be taken into 
account by providing a function that computes the correspond- 
ing penalty and its gradient. For the moment, MiRA handles 
complex visibility, power spectrum and closure-phase data via 
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Fig. 5. Image reconstruction with phase closure (left) and without any Fourier 
phase information (right). 



penalty terms given by Eq. ([3T]) , Eq. ([39l) and Eq. (l42l) . Also 
many different regularizers are built into MiRA (negentropy, 
quadratic or edge-preserving smoothness, compactness, total 
variation, etc.) and provisions are made to implement custom 
priors. MiRA can cope with any missing data, in particular, it 
can be used to restore an image given only the power spectrum 
{i.e. without any Fourier phase information) with at least a 
180° orientation ambiguity. An example of reconstruction with 
no phase data is shown in Fig. [5] In the case of non- sparse 
{u^v) coverage, the problem of image reconstruction from 
the modulus of its Fourier transform has been addressed by 
Fienup [59] by means of an algorithm based on projections 
onto convex sets (POCS). 

5) Wisard algorithm [55] recovers an image from power 
spectrum and phase closure data. It exploits a self-calibration 
approach (cf. Section Ull-Hb to recover missing Fourier phases. 
Given a current estimate of the image and the phase closure 
data, Wisard first derives missing Fourier phase information 
in such a way as to minimize the number of unknowns. Then, 
the synthesized Fourier phases are combined with the square 
root of the measured power spectrum to generate pseudo com- 
plex visibility data which are fitted by the image restoration 
step. This step is performed by using the chosen regularization 
and a penalty with respect to the pseudo complex visibility 
data. However, to account for a more realistic approximation 
of the distribution of complex visibility errors, WiSARD make 
uses of a quadratic penalty which is different from the usual 
Goodman approximation [14]. Taken separately, the image 
restoration step is a convex problem with a unique solution, the 
self-calibration step is not strictly convex but (like in original 
self-calibration method) does not seem to pose insurmountable 
problems. Nevertheless, the global problem is multi-modal 
and, at least in difficult cases, the final solution depends on the 
initial guess. There are many possible regularizers built into 
Wisard, such as the one in Eq. (l36l) and the edge-preserving 
smoothness prior in Eq. (IJTI) . 

MiRA and WiSARD have been developed in parallel and 
share some common features. They use the same optimization 
engine [58] and means to impose positivity and normalization 
[28]. They however differ in the way missing data is taken into 
account: Wisard takes a self-calibration (cf. Section IIII-Hb 
approach to explicitly solve for missing Fourier phase in- 
formation; while MiRA implicitly accounts for any lack of 
information through the direct model of the data [28]. 

All these algorithms have been compared on simulated 
data during Interferometric Beauty Contests [16], [60], [61]. 
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Fig. 6. Image reconstruction under various regularization levels. Algorithm is 
MiRA with edge-preserving regularization given in Eq. (3l\ with e = 10 ~^ 
and 12 = 10^ (top-left), /i = 10^ (top-right), = 10^ (bottom-left) and 
/i = 3 X 10^ (bottom-right). 



The results of the contest were very encouraging. Although 
being quite different algorithms, BSMEM, the Building Block 
Method, MiRA and WiSARD give good image reconstructions 
where the main features of the objects of interest can be 
identified in spite of the sparse (u^v) coverage, the lack of 
some Fourier phase information and the non-linearities of the 
measurements. BSMEM and MiRA appear to be the most 
successful algorithms (they respectively won the first two and 
last contests). 

With their tuning parameters and, for some of them, the 
requirement to start with an initial image, these algorithms 
still need some expertise to be used successfully. But this is 
quite manageable. For instance, the tuning of the regularization 
level can be derived from Bayesian considerations but can also 
almost be done by visual inspection of the restored image. 
From Fig. [6l one can see the effects of under-regularization 
(which yields more artifacts) and over-regularization (which 
yields over simplification of the image). In that case, a good 
regularization level is probably between /i = 10^ and /i = 10^ 
and any choice in this range would give a good image. Figured 
shows image reconstructions from one of the data sets of 
the 2004' Beauty Contest [16] and with different types of 
regularization. These synthesized images do not greatly differ 
and are all quite acceptable approximations of the reality 
(compare for instance with the dirty image in Fig. O. Hence, 
provided that the level of the priors is correctly set, the 
particular choice of a given regularizer can be seen as a 
refinement that can be done after some reconstruction attempts 
with a prior that is simpler to tune. At least, the qualitative 
type of prior is what really matters, not the specific expression 
of the penalty imposing the prior. 

V. Discussion 

The main issues in image reconstruction from interferomet- 
ric data are the sparsity of the measurements (which sample 
the Fourier transform of the object brightness distribution) and 



the lack of part of the Fourier phase information. The inverse 
problem approach appears to be suitable to describe the most 
important existing algorithms in this context. Indeed, the image 
reconstruction methods can be stated as the minimization of a 
mixed criterion under some strict constraints such as positivity 
and normalization. Two different types of terms appear into 
this criterion: likelihood terms which enforce consistency of 
the model image with the data, and regularization terms which 
maintain the image close to the priors required to lever the 
degeneracies of the image reconstruction problem. Hence, the 
differences between the various algorithms lie in the kind of 
measurements considered, in the approximations for the direct 
model and for the statistics of the errors and in the prior 
imposed by the regularization. For non-convex criteria which 
occur when the OTF is unknown or when non-linear estima- 
tors are measured to overcome turbulence effects, the initial 
solution and the optimization strategy are also key components 
of the algorithms. Although global optimization is required to 
solve such multi-modal problems, most existing algorithms are 
successful whereas they only implement local optimization. 
These algorithms are not fully automated black boxes: at least 
some tuning parameters and the type of regularization are 
left to the user choice. Available methods are however now 
ready for image reconstruction from real data. Nevertheless, 
a general understanding of the mechanisms involved in image 
restoration algorithms is mandatory to correctly use these 
methods and to analyze possible artifacts in the synthesized 
images. From a technical point of view, future developments 
of these algorithms will certainly focus on global optimization 
and unsupervised reconstruction. However, to fully exploit the 
existing instruments, the most worthwhile tracks to investigate 
are multi- spectral imaging and accounting for additional data 
such as a low resolution image of the observed object to 
overcome the lack of short baselines. 
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